Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors

  • S. Aranganayagi
  • K. Thangavel
Abstract

K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data by replacing the mean with the mode. The dissimilarity measure proposed by Huang is the simple matching measure, which only records whether two attribute values match or mismatch. Because the weights of attribute values contribute substantially to clustering, in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of the frequency of attribute values in the cluster and in the data set. The new weighted measure is evaluated on data sets obtained from the UCI data repository. The results are compared with those of K-Modes and K-representative and show that the new measure generates clusters with high purity.

Keywords: clustering, categorical data, K-Modes, weighted dissimilarity measure
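To make the contrast in the abstract concrete, the following is a minimal Python sketch. `simple_matching` and `cluster_mode` follow the standard K-Modes definitions (count of mismatched attributes, per-attribute most frequent value). The abstract does not state the proposed formula, so `weighted_dissimilarity` is only a hypothetical illustration of a frequency-ratio weighting, not the authors' measure; the function names and the toy data are assumptions as well.

```python
from collections import Counter


def simple_matching(x, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(1 for a, b in zip(x, mode) if a != b)


def cluster_mode(rows):
    """Mode of a cluster: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def weighted_dissimilarity(x, mode, cluster_rows, all_rows):
    """Hypothetical weighted measure (an assumption, NOT the paper's formula).

    A mismatch costs 1. A match costs 1 - (count of the value in the
    cluster / count of the value in the data set), so a matching value
    that is concentrated in this cluster costs close to 0.
    Assumes cluster_rows is a subset of all_rows.
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, mode)):
        if a != b:
            total += 1.0
            continue
        in_cluster = sum(1 for r in cluster_rows if r[j] == a)
        in_data = sum(1 for r in all_rows if r[j] == a)
        # Guard against a value that never occurs in the data set.
        total += 1.0 - in_cluster / in_data if in_data else 1.0
    return total


# Toy data set with two categorical attributes (colour, size).
data = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
cluster = [data[0], data[1], data[3]]
mode = cluster_mode(cluster)

d_plain = simple_matching(("blue", "small"), mode)
d_weighted = weighted_dissimilarity(("blue", "small"), mode, cluster, data)
```

In this toy example simple matching treats every matching attribute as cost 0, while the illustrative weighted variant still charges a small cost for `"small"`, because that value also occurs outside the cluster.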

Similar resources

A dissimilarity measure for the k-Modes clustering algorithm

Clustering is one of the most important data mining techniques that partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community recently. As the extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes...

A fuzzy k-modes algorithm for clustering categorical data

This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of means for clusters, a new approach is developed, which allows the use of the k-means paradigm to efficiently cluster large categorical data sets. A fuzzy k-modes algorithm is presented and the effec...

A Global-Relationship Dissimilarity Measure for the k-Modes Clustering Algorithm

The k-modes clustering algorithm has been widely used to cluster categorical data. In this paper, we firstly analyzed the k-modes algorithm and its dissimilarity measure. Based on this, we then proposed a novel dissimilarity measure, which is named as GRD. GRD considers not only the relationships between the object and all cluster modes but also the differences of different attributes. Finally ...

Genetic Distance Measure for K-modes Algorithm

K-means algorithm has been shown to be an effective and efficient algorithm for clustering. However, the k-means algorithm is developed for numerical data only. It is not suitable for the clustering of non-numerical data. K-modes algorithm has been developed for clustering categorical objects by extending from the k-means algorithm. However, no one applies this technique for classification of c...

Improving K-Modes Algorithm Considering Frequencies of Attribute Values in Mode

The original k-means algorithm is designed to work primarily on numeric data sets. This prohibits the algorithm from being applied to categorical data clustering, which is an integral part of data mining and has attracted much attention recently. The k-modes algorithm extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes versus t...

Publication date: 2009